Attempt to optimize LocalExecutor #144

base: master
Conversation
Hmmm, `VecDeque::new` is not `const` until Rust 1.68. It could be trivially replaced with `LinkedList`, which has a `const` constructor as of 1.63, but that would be a major regression in performance.

There is no need to worry about the MSRV of `const VecDeque::new`, because we will be able to raise the MSRV in about two weeks: smol-rs/smol#244 (comment)
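The constraint discussed above can be reduced to a minimal sketch (`LocalQueue` here is a hypothetical stand-in for the executor's internal run queue, not the crate's actual type):

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for the executor's internal run queue.
struct LocalQueue {
    runnable: VecDeque<u64>,
}

impl LocalQueue {
    // `VecDeque::new` became `const` in Rust 1.68; on older compilers this
    // `const fn` fails to build, which is the MSRV concern raised above.
    // `LinkedList::new` has been `const` since 1.63, but swapping it in
    // would regress performance.
    const fn new() -> Self {
        LocalQueue { runnable: VecDeque::new() }
    }
}

fn main() {
    let q = LocalQueue::new();
    assert!(q.runnable.is_empty());
}
```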
OK, did a more comprehensive benchmark run and it's pretty clear that the
This PR should be ready for review now.

I don't think we can reasonably make
Force-pushed from d206459 to 13fbe6f
Spoke too soon, added an early-out when the queue is empty to

@james7132 Can you rebase on master?
Force-pushed from 13fbe6f to c4f077f
Done!
```rust
#[inline]
pub(crate) fn state_ptr(&self) -> *const State {
    #[cold]
    fn alloc_state(cell: &Cell<*mut State>) -> *mut State {
```
Why do we delay the allocation of the state? How beneficial is this from a performance perspective, given that it trades the ability to optimize away the state allocation entirely when the `LocalExecutor` gets destroyed without ever being used, for a branch on every state access (minus the few where the compiler can perhaps infer that the state is non-null / initialized)?
It allows `LocalExecutor::new` to be `const`; without it this would be a breaking change and would require a major version bump. `Executor` has this as well for similar reasons, and has an `AtomicPtr` instead of a `Cell`.
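The pattern under discussion can be sketched roughly as follows (illustrative names, not the crate's actual code; for brevity the sketch leaks the state, where the real type would free it in `Drop`):

```rust
use std::cell::Cell;
use std::ptr;

struct State {
    tasks: usize,
}

struct LocalExecutor {
    state: Cell<*mut State>,
}

impl LocalExecutor {
    // The pointer starts out null, so the constructor can be `const`.
    const fn new() -> Self {
        LocalExecutor { state: Cell::new(ptr::null_mut()) }
    }

    fn state_ptr(&self) -> *const State {
        // Cold path: taken only on the very first access.
        #[cold]
        fn alloc_state(cell: &Cell<*mut State>) -> *mut State {
            let ptr = Box::into_raw(Box::new(State { tasks: 0 }));
            cell.set(ptr);
            ptr
        }

        let mut ptr = self.state.get();
        if ptr.is_null() {
            ptr = alloc_state(&self.state);
        }
        ptr
    }
}

fn main() {
    let ex = LocalExecutor::new();
    let first = ex.state_ptr();
    assert!(!first.is_null());
    // Subsequent accesses reuse the same allocation.
    assert_eq!(first, ex.state_ptr());
}
```

This is the trade-off the question points at: an unused executor never allocates, at the cost of a null check on each access.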
That said, this comment makes me realize that the `State` in both cases satisfies all of the safety invariants needed to make the state a `Pin<Box<State>>` instead of the reference-counted types, so we can avoid the reference counting even without `'static` borrows like the `Static` variants. As far as I know this holds true even for the instances of either that get leaked into their `Static` variants.
As far as I'm aware, this sounds correct.
Okay, for the next major version bump it would be nice to pursue a version which doesn't use the internal `Cell` wrapping, at least for the `LocalExecutor` version, where "difficulty of sharing across threads" isn't that much of a concern.
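A minimal sketch of that direction (an assumed shape, not the merged design; it sets aside the lazy `const` construction discussed earlier, since `Box::pin` allocates):

```rust
use std::pin::Pin;

struct State {
    queued: usize,
}

struct LocalExecutor {
    // Owned, pinned state instead of reference counting: no `Rc` bookkeeping,
    // and the state's address is stable for the executor's whole lifetime.
    state: Pin<Box<State>>,
}

impl LocalExecutor {
    fn new() -> Self {
        LocalExecutor { state: Box::pin(State { queued: 0 }) }
    }

    fn state(&self) -> Pin<&State> {
        self.state.as_ref()
    }
}

fn main() {
    let ex = LocalExecutor::new();
    assert_eq!(ex.state().queued, 0);
}
```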
Went ahead and implemented the pin change, and filed #146 for the non-local version.
Force-pushed from 2b5adca to d0a1f9e
```rust
fn spawn_inner<T: 'a>(
    state: Pin<&'a State>,
    future: impl Future<Output = T> + 'a,
    active: &mut Slab<Waker>,
```
This function has one aspect which bothers me: it gets passed an `active: &mut Slab<Waker>`, but in its body it also constructs additional `unsafe { &mut *state.active.get() }` references, violating the "exclusive / strictly nested" property of `&mut` references. This is undefined behavior both because it violates Rust's memory aliasing model, and because rustc annotates functions accordingly when it encounters `&mut` in arguments, so it can also cause LLVM to deduce wrong properties about the function. The place where such additional bad references are constructed is in the closure for `future` below, which would, for example, be invoked in the case of panics in the function.
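The hazard can be reduced to a small sketch (hypothetical types, with `Vec` standing in for `Slab`): re-deriving a second `&mut` from the cell while `active` is live is the UB pattern, and the sound shape is to derive the exclusive borrow once at the boundary and thread it through.

```rust
use std::cell::UnsafeCell;

struct State {
    active: UnsafeCell<Vec<u32>>,
}

// UB shape (what the review warns against), left as a comment:
// fn spawn_inner(state: &State, active: &mut Vec<u32>) {
//     let second = unsafe { &mut *state.active.get() };
//     // `active` and `second` are now two live `&mut` to the same Vec.
// }

// Sound shape: take only the single `&mut` that was derived at the boundary.
fn spawn_inner(active: &mut Vec<u32>) -> usize {
    let index = active.len();
    active.push(index as u32);
    index
}

fn main() {
    let state = State { active: UnsafeCell::new(Vec::new()) };
    // Derive the exclusive borrow exactly once...
    let active = unsafe { &mut *state.active.get() };
    // ...and pass only that borrow down.
    assert_eq!(spawn_inner(active), 0);
}
```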
I know this problem also exists for the normal `Executor::spawn_inner`, but in this case the problem was plainly more obvious.
```rust
// Remove the task from the set of active tasks when the future finishes.
let entry = active.vacant_entry();
let index = entry.key();
let future = AsyncCallOnDrop::new(future, move || {
```
It would be nice if `Builder::new().propagate_panic(true)` could be saved in a binding above this line. It is not really necessary, but it makes it obvious that no panics can occur between the construction of `AsyncCallOnDrop` and the call to `.spawn_unchecked`, at which point the handling of drop unwinding gets taken over by `Task`, which automatically schedules itself in that case and notifies this `LocalExecutor`.

Note that `entry.insert` might allocate, and subsequently panic on OOM. I think the easiest way to prevent panics in this function in a sort-of "critical section" w.r.t. `state.active` would be to invoke `active.reserve(1);` at the top of the function.
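The `reserve(1)` idea, reduced to a sketch with `Vec` standing in for `Slab` (hypothetical function, not the crate's code):

```rust
// Pre-allocating moves any OOM panic out of the "critical section":
// after `reserve(1)`, the `push` below cannot allocate and so cannot panic.
fn insert_no_panic(active: &mut Vec<u32>, value: u32) -> usize {
    active.reserve(1); // any allocation failure panics here, up front

    // --- critical section w.r.t. `active`: guaranteed panic-free ---
    let index = active.len();
    active.push(value);
    index
}

fn main() {
    let mut active = Vec::new();
    assert_eq!(insert_no_panic(&mut active, 7), 0);
    assert_eq!(active[0], 7);
}
```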
Really good catch! Unwinds continue to be the one control flow path that seems to be hard to find. I went ahead and moved the builder outside of the "critical section".

`reserve` addresses the allocator OOM from the `Slab`, but OOM may also occur in `schedule` with the `VecDeque`, and it adds extra work on the hot path. I'd rather use a more robust / zero-cost method, so I opted to put an `AbortOnPanic` guard around the critical section. Rust's drop order should force it to run before the `AsyncCallOnDrop` is dropped.
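A minimal sketch of an `AbortOnPanic`-style guard (an assumed shape, not necessarily the crate's actual type): if the guarded scope unwinds, the destructor runs during unwinding and aborts the process, so the critical section can never be observed half-done.

```rust
struct AbortOnPanic;

impl Drop for AbortOnPanic {
    fn drop(&mut self) {
        // Runs on every scope exit; only aborts if we are unwinding.
        if std::thread::panicking() {
            std::process::abort();
        }
    }
}

fn critical_section() -> u32 {
    let _guard = AbortOnPanic;
    // ... mutate shared executor state here; a panic would abort the
    // process instead of leaving the state torn ...
    let result = 40 + 2;
    result
    // `_guard` drops normally here (not panicking), so nothing happens.
}

fn main() {
    assert_eq!(critical_section(), 42);
}
```

Because destructors run in reverse declaration order, declaring the guard first makes it the last thing to run, which is what lets it observe a panic from anything constructed after it.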
Co-authored-by: Ellen Emilia Anna Zscheile <[email protected]>
Overall LGTM, but as this is a complicated change, it should still be reviewed by others.
cc @smol-rs/admins
This PR makes `LocalExecutor` its own separate implementation instead of wrapping an `Executor`, forking its own `State` type and utilizing unsynchronized `!Send` types where possible: `Arc` -> `Rc`, `Mutex` -> `RefCell`, `AtomicPtr` -> `Cell<*mut T>`, `ConcurrentQueue` -> `VecDeque`. This implementation also removes any extra operations that assume there are other concurrent Runners/Tickers (i.e. local queues, extra notifications).

For testing, I've duplicated most of the single-threaded-compatible executor tests to ensure there's equivalent coverage on the new independent `LocalExecutor`.

I've also made an attempt to write this with `UnsafeCell` instead of `RefCell`, but it litters a huge amount of `unsafe` that might be too much for this crate. The gains there might not be worth it.

I previously wrote some additional benchmarks but lost the changes to a stray `git reset --hard` while benchmarking.

The gains here are substantial. Here are the results: